Some minor edits in the tokenization blog post #3236

ariG23498 · 2025-12-18T16:00:06Z

No description provided.

pcuenca · 2025-12-18T16:19:42Z

tokenizers.md

-></iframe>
-
-Language models don't read raw text. They consume sequences of integers usually called **token IDs or input IDs**. Tokenization is the process of converting raw text into these token IDs.
+Language models don't read raw text. They consume sequences of integers usually called **token IDs or input IDs**. Tokenization is the process of converting raw text into these token IDs. (Try the tokenization playground [here](https://huggingface.co/spaces/Xenova/the-tokenizer-playground) to visualize tokenization.)
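
For anyone reading this thread without the playground open, here is a minimal sketch of the text-to-token-ID mapping that the paragraph describes. The vocabulary and the `encode`/`decode` helpers are hypothetical and exist only for illustration. Real tokenizers typically split text into subword units with a learned vocabulary, not whitespace-separated words.

```python
# Toy illustration only: a hypothetical vocabulary, not any real model's tokenizer.
VOCAB = {"<unk>": 0, "language": 1, "models": 2, "read": 3, "token": 4, "ids": 5}


def encode(text: str) -> list[int]:
    """Map each lowercased, whitespace-separated word to its ID, or to <unk>."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]


def decode(ids: list[int]) -> str:
    """Map IDs back to their vocabulary entries, joined by single spaces."""
    id_to_token = {token_id: token for token, token_id in VOCAB.items()}
    return " ".join(id_to_token[token_id] for token_id in ids)


token_ids = encode("Language models read token IDs")
print(token_ids)          # [1, 2, 3, 4, 5]
print(decode(token_ids))  # language models read token ids
```

Words outside the toy vocabulary all collapse to the `<unk>` ID, which is one reason production tokenizers fall back to smaller subword pieces instead.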


tok refactor

f3dc9f9

ariG23498 requested review from pcuenca and sergiopaniego December 18, 2025 16:00

sergiopaniego approved these changes Dec 18, 2025

sergiopaniego merged commit a2f4a62 into main Dec 18, 2025
1 check passed

sergiopaniego deleted the aritra/tok-refactor branch December 18, 2025 16:01

pcuenca reviewed Dec 18, 2025


4 participants
